Overview

Dataset statistics

Number of variables: 20
Number of observations: 24649092
Missing cells: 27888939
Missing cells (%): 5.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.7 GiB
Average record size in memory: 160.0 B
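The overview figures follow from simple arithmetic on the counts. The check below is an illustrative recomputation in plain Python, not pandas-profiling's own code. It assumes two things: Missing cells (%) divides by variables × observations, and the in-memory total is roughly the average record size times the row count.

```python
n_vars = 20
n_obs = 24_649_092

# Missing counts per column, taken from the Alerts list: four columns with
# 809967 missing values each, plus airport_fee.
per_column_missing = [809_967] * 4 + [24_649_071]
missing_cells = sum(per_column_missing)

# Assumption: the denominator is every cell in the table.
missing_pct = 100 * missing_cells / (n_vars * n_obs)

# Assumption: total size is about the average record size times the row count.
record_bytes = 160.0
total_gib = record_bytes * n_obs / 2**30

print(missing_cells)                                 # 27888939
print(f"Missing cells (%): {missing_pct:.1f}%")      # Missing cells (%): 5.7%
print(f"Total size in memory: {total_gib:.1f} GiB")  # Total size in memory: 3.7 GiB
```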

Variable types

Numeric: 14
Categorical: 3
DateTime: 2
Boolean: 1

Alerts

airport_fee has constant value "0.0" [Constant]
VendorID is highly correlated with airport_fee [High correlation]
trip_distance is highly correlated with fare_amount and 1 other field [High correlation]
payment_type is highly correlated with improvement_surcharge [High correlation]
fare_amount is highly correlated with total_amount [High correlation]
extra is highly correlated with mta_tax and 1 other field [High correlation]
mta_tax is highly correlated with extra and 1 other field [High correlation]
tip_amount is highly correlated with payment_type [High correlation]
improvement_surcharge is highly correlated with payment_type and 1 other field [High correlation]
total_amount is highly correlated with fare_amount and 2 other fields [High correlation]
store_and_fwd_flag is highly correlated with airport_fee [High correlation]
airport_fee is highly correlated with improvement_surcharge and 2 other fields [High correlation]
congestion_surcharge is highly correlated with improvement_surcharge [High correlation]
passenger_count has 809967 (3.3%) missing values [Missing]
RatecodeID has 809967 (3.3%) missing values [Missing]
store_and_fwd_flag has 809967 (3.3%) missing values [Missing]
congestion_surcharge has 809967 (3.3%) missing values [Missing]
airport_fee has 24649071 (> 99.9%) missing values [Missing]
trip_distance is highly skewed (γ1 = 738.5508544) [Skewed]
RatecodeID is highly skewed (γ1 = 104.9910041) [Skewed]
fare_amount is highly skewed (γ1 = 2856.264782) [Skewed]
extra is highly skewed (γ1 = 4963.651562) [Skewed]
mta_tax is highly skewed (γ1 = 4964.781006) [Skewed]
tip_amount is highly skewed (γ1 = 26.07406101) [Skewed]
tolls_amount is highly skewed (γ1 = 55.09558913) [Skewed]
total_amount is highly skewed (γ1 = 2523.557184) [Skewed]
passenger_count has 489385 (2.0%) zeros [Zeros]
trip_distance has 330110 (1.3%) zeros [Zeros]
payment_type has 809967 (3.3%) zeros [Zeros]
extra has 10273258 (41.7%) zeros [Zeros]
tip_amount has 7368621 (29.9%) zeros [Zeros]
tolls_amount has 23523820 (95.4%) zeros [Zeros]
congestion_surcharge has 2033937 (8.3%) zeros [Zeros]
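Every alert quotes a count and its share of all 24649092 rows. Tiny shares are shown as "< 0.1%" and near-total shares as "> 99.9%". The helper below is a hypothetical reimplementation of that display rule, not the library's own function. The rule rounds to one decimal place, and the two edge cases are inferred from the values in this report.

```python
N_ROWS = 24_649_092  # number of observations in this report

def fmt_share(count: int, total: int = N_ROWS) -> str:
    """Format count/total like the report's percentage columns (assumed rule)."""
    share = count / total
    # Non-zero shares that would round to 0.0% are shown as "< 0.1%".
    if share > 0 and round(share, 3) == 0:
        return "< 0.1%"
    # Shares that would round to 100.0% without being exactly 1 are "> 99.9%".
    if share < 1 and round(share, 3) == 1:
        return "> 99.9%"
    return f"{share * 100:.1f}%"

for name, count in [
    ("passenger_count missing", 809_967),
    ("airport_fee missing", 24_649_071),
    ("extra zeros", 10_273_258),
    ("df_index zeros", 12),
]:
    print(f"{name}: {count} ({fmt_share(count)})")
```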

Reproduction

Analysis started: 2022-09-30 03:33:06.320332
Analysis finished: 2022-09-30 04:36:45.245351
Duration: 1 hour, 3 minutes and 38.93 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json
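The reported Duration is the difference between the two timestamps. A quick check with the standard library's datetime:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2022-09-30 03:33:06.320332")
finished = datetime.fromisoformat("2022-09-30 04:36:45.245351")

elapsed = finished - started
hours, rem = divmod(elapsed.total_seconds(), 3600)
minutes, seconds = divmod(rem, 60)
print(f"{int(hours)} hour, {int(minutes)} minutes and {seconds:.2f} seconds")
# 1 hour, 3 minutes and 38.93 seconds
```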

Variables

df_index
Real number (ℝ≥0)

Distinct: 6405008
Distinct (%): 26.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2047270.132
Minimum: 0
Maximum: 6405007
Zeros: 12
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 188.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 102704
Q1: 558443
median: 1340080
Q3: 3271050.25
95-th percentile: 5735959.45
Maximum: 6405007
Range: 6405007
Interquartile range (IQR): 2712607.25
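For an integer index column, Q3 (3271050.25) and the 95-th percentile (5735959.45) still fall between integers. That pattern comes from linear interpolation between order statistics. The sketch below assumes pandas-profiling relies on pandas' default linear quantile method. On that assumption, Python's statistics.quantiles with method="inclusive" gives the same interpolation. The function name is hypothetical and the sample data is made up.

```python
import statistics

def quantile_summary(values):
    """Quantile block in the report's layout, using linear interpolation."""
    data = sorted(values)
    # method="inclusive" places cut points at p * (n - 1): linear interpolation.
    pct = statistics.quantiles(data, n=100, method="inclusive")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "Minimum": data[0],
        "5-th percentile": pct[4],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "95-th percentile": pct[94],
        "Maximum": data[-1],
        "Range": data[-1] - data[0],
        "Interquartile range (IQR)": q3 - q1,
    }

for label, value in quantile_summary([0, 1, 2, 3, 4, 5, 6, 7, 8, 10]).items():
    print(f"{label}: {value}")
```

The IQR is simply Q3 minus Q1. In the table above, that is 3271050.25 - 558443 = 2712607.25.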

Descriptive statistics

Standard deviation: 1822005.672
Coefficient of variation (CV): 0.8899683748
Kurtosis: -0.5348068312
Mean: 2047270.132
Median Absolute Deviation (MAD): 1024731
Skewness: 0.8479502783
Sum: 5.046334982 × 10^13
Variance: 3.319704668 × 10^12
Monotonicity: Not monotonic
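Each figure in this block is a standard summary of the column. As a sketch, the block can be recomputed with Python's statistics module. The function names are hypothetical, and the monotonicity labels are assumed to mirror the report's wording. The sketch uses the sample variance (n - 1 denominator) and the median-based MAD. The last lines check that the reported df_index figures are internally consistent: CV equals standard deviation / mean, and variance equals standard deviation squared.

```python
import statistics

def monotonicity(values):
    """Label the sequence order (labels assumed to mirror the report)."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"

def descriptive_summary(values):
    """Descriptive statistics block, using the sample (n - 1) variance."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    std = statistics.stdev(values)
    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Mean": mean,
        # Median of absolute deviations from the median, not from the mean.
        "Median Absolute Deviation (MAD)": statistics.median([abs(x - median) for x in values]),
        "Sum": sum(values),
        "Variance": statistics.variance(values),
        "Monotonicity": monotonicity(values),
    }

# Consistency check on the df_index figures reported above.
reported_std, reported_mean = 1822005.672, 2047270.132
print("CV:", reported_std / reported_mean)
print("Variance:", reported_std ** 2)
```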

Histogram with fixed size bins (bins=50)
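A histogram with fixed size bins splits the interval from minimum to maximum into equal-width bins. The minimal stdlib sketch below uses a hypothetical function name. It assumes the numpy-style convention that every bin is half-open except the last, which also includes the maximum; counts exactly on an edge may differ slightly from other implementations because of floating-point division.

```python
def fixed_width_histogram(values, bins):
    """Count values into `bins` equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # Bin index; the maximum lands in the last bin instead of overflowing.
        i = int((x - lo) / width) if width else 0
        counts[min(i, bins - 1)] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return counts, edges

counts, edges = fixed_width_histogram([0, 1, 2, 3, 4], bins=2)
print(counts)  # [2, 3]
print(edges)   # [0.0, 2.0, 4.0]
```

Fixed-width bins are sensitive to extremes. For trip_distance the same rule gives 50 bins of width 350945.51 / 50 ≈ 7018.9. With the 95-th percentile at 10.38, at least 95% of trips therefore fall into the first bin.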

Common values

Value                   Count     Frequency (%)
0                       12        < 0.1%
140695                  12        < 0.1%
165849                  12        < 0.1%
182225                  12        < 0.1%
133065                  12        < 0.1%
149441                  12        < 0.1%
100281                  12        < 0.1%
116657                  12        < 0.1%
67497                   12        < 0.1%
83873                   12        < 0.1%
Other values (6404998)  24648972  > 99.9%
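The table above keeps the ten most frequent values and lumps the rest into one "Other values (k)" row. Elsewhere in the report, missing entries get their own "(Missing)" row, and all shares are taken over every row including missing ones. For example, passenger_count = 1 is 17511386 of 24649092 rows, or 71.0%. A sketch of that layout with collections.Counter follows; the function name and sample data are hypothetical.

```python
from collections import Counter

def common_values(values, top=10):
    """Top-`top` rows, then 'Other values (k)' and '(Missing)' rows.

    Each row is (label, count, share), with shares taken over all rows.
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)
    total = len(values)

    rows = counts.most_common(top)
    shown = sum(n for _, n in rows)
    n_other_distinct = len(counts) - len(rows)
    if n_other_distinct:
        rows.append((f"Other values ({n_other_distinct})", len(present) - shown))
    n_missing = total - len(present)
    if n_missing:
        rows.append(("(Missing)", n_missing))
    return [(label, n, n / total) for label, n in rows]

data = [1] * 5 + [2] * 3 + [3, 4, 5] + [None] * 2
for label, n, share in common_values(data, top=2):
    print(f"{label}  {n}  {share:.1%}")
```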

Minimum 10 values

Value  Count  Frequency (%)
0      12     < 0.1%
1      12     < 0.1%
2      12     < 0.1%
3      12     < 0.1%
4      12     < 0.1%
5      12     < 0.1%
6      12     < 0.1%
7      12     < 0.1%
8      12     < 0.1%
9      12     < 0.1%

Maximum 10 values

Value    Count  Frequency (%)
6405007  1      < 0.1%
6405006  1      < 0.1%
6405005  1      < 0.1%
6405004  1      < 0.1%
6405003  1      < 0.1%
6405002  1      < 0.1%
6405001  1      < 0.1%
6405000  1      < 0.1%
6404999  1      < 0.1%
6404998  1      < 0.1%

VendorID
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 188.1 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 24649092
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
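Python's unicodedata module exposes the same general-category property, so the "Most occurring categories" view can be sketched as below. The mapping from two-letter codes to long names covers only the categories that appear in this report; any other code is passed through unchanged. unicodedata has no script or block lookup, so those views are not reproduced. The function name is hypothetical.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Po": "Other Punctuation", "Pd": "Dash Punctuation"}

def category_counts(strings):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for s in strings:
        for ch in s:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

print(category_counts(["0.3", "-0.3", "0.0"]))
# Counter({'Decimal Number': 6, 'Other Punctuation': 3, 'Dash Punctuation': 1})
```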

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 1
4th row: 2
5th row: 1

Common Values

Value  Count     Frequency (%)
2      16599044  67.3%
1      8004823   32.5%
6      45097     0.2%
5      128       < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count     Frequency (%)
2      16599044  67.3%
1      8004823   32.5%
6      45097     0.2%
5      128       < 0.1%

Most occurring characters

Value  Count     Frequency (%)
2      16599044  67.3%
1      8004823   32.5%
6      45097     0.2%
5      128       < 0.1%

Most occurring categories

Value           Count     Frequency (%)
Decimal Number  24649092  100.0%

Most frequent character per category

Decimal Number
Value  Count     Frequency (%)
2      16599044  67.3%
1      8004823   32.5%
6      45097     0.2%
5      128       < 0.1%

Most occurring scripts

Value   Count     Frequency (%)
Common  24649092  100.0%

Most frequent character per script

Common
Value  Count     Frequency (%)
2      16599044  67.3%
1      8004823   32.5%
6      45097     0.2%
5      128       < 0.1%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  24649092  100.0%

Most frequent character per block

ASCII
Value  Count     Frequency (%)
2      16599044  67.3%
1      8004823   32.5%
6      45097     0.2%
5      128       < 0.1%

DateTime

Distinct: 11776036
Distinct (%): 47.8%
Missing: 0
Missing (%): 0.0%
Memory size: 188.1 MiB
Minimum: 2002-12-31 23:06:55
Maximum: 2021-06-10 10:10:48

Histogram with fixed size bins (bins=50)

DateTime

Distinct: 11776414
Distinct (%): 47.8%
Missing: 0
Missing (%): 0.0%
Memory size: 188.1 MiB
Minimum: 2002-12-31 23:08:03
Maximum: 2021-06-10 10:41:42

Histogram with fixed size bins (bins=50)

passenger_count
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 10
Distinct (%): < 0.1%
Missing: 809967
Missing (%): 3.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.467982655
Minimum: 0
Maximum: 9
Zeros: 489385
Zeros (%): 2.0%
Negative: 0
Negative (%): 0.0%
Memory size: 188.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 5
Maximum: 9
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.112779455
Coefficient of variation (CV): 0.7580331083
Kurtosis: 6.349505884
Mean: 1.467982655
Median Absolute Deviation (MAD): 0
Skewness: 2.556566397
Sum: 34995422
Variance: 1.238278114
Monotonicity: Not monotonic
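Mean and Sum for passenger_count ignore the 809967 missing rows. Dividing Sum by the non-missing count reproduces the reported Mean, whereas dividing by all rows would not. A quick check of that arithmetic:

```python
n_rows = 24_649_092
n_missing = 809_967
value_sum = 34_995_422  # "Sum" reported for passenger_count

mean_present = value_sum / (n_rows - n_missing)  # missing rows excluded
mean_all_rows = value_sum / n_rows                # missing rows counted as 0

print(f"{mean_present:.9f}")   # 1.467982655, the reported Mean
print(f"{mean_all_rows:.9f}")
```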

Histogram with fixed size bins (bins=10)

Common values

Value      Count     Frequency (%)
1          17511386  71.0%
2          3349141   13.6%
3          872659    3.5%
5          751719    3.0%
0          489385    2.0%
6          474541    1.9%
4          390094    1.6%
7          91        < 0.1%
8          58        < 0.1%
9          51        < 0.1%
(Missing)  809967    3.3%

Minimum 10 values

Value  Count     Frequency (%)
0      489385    2.0%
1      17511386  71.0%
2      3349141   13.6%
3      872659    3.5%
4      390094    1.6%
5      751719    3.0%
6      474541    1.9%
7      91        < 0.1%
8      58        < 0.1%
9      51        < 0.1%

Maximum 10 values

Value  Count     Frequency (%)
9      51        < 0.1%
8      58        < 0.1%
7      91        < 0.1%
6      474541    1.9%
5      751719    3.0%
4      390094    1.6%
3      872659    3.5%
2      3349141   13.6%
1      17511386  71.0%
0      489385    2.0%

trip_distance
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 7375
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.527101448
Minimum: -30.62
Maximum: 350914.89
Zeros: 330110
Zeros (%): 1.3%
Negative: 2338
Negative (%): < 0.1%
Memory size: 188.1 MiB
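The Zeros, Negative and Infinite rows are plain counts over the column, each shown with its share of all rows. trip_distance, for instance, has 330110 zeros and 2338 negative distances. The stdlib sketch below follows those definitions. It assumes "missing" means None or NaN, as in pandas, and the function name is hypothetical.

```python
import math

def sign_profile(values):
    """Count missing, infinite, zero and negative entries; return (count, share) pairs."""
    present = [x for x in values if x is not None and not math.isnan(x)]
    n = len(values)
    counts = {
        "Missing": n - len(present),
        "Infinite": sum(1 for x in present if math.isinf(x)),
        "Zeros": sum(1 for x in present if x == 0),
        "Negative": sum(1 for x in present if x < 0),
    }
    # Shares use all rows, missing included, as in the report.
    return {name: (count, count / n) for name, count in counts.items()}

sample = [0.0, -1.5, 2.0, None, float("inf"), float("nan"), 0]
for name, (count, share) in sign_profile(sample).items():
    print(f"{name}: {count} ({share:.1%})")
```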

Quantile statistics

Minimum: -30.62
5-th percentile: 0.45
Q1: 0.99
median: 1.65
Q3: 3
95-th percentile: 10.38
Maximum: 350914.89
Range: 350945.51
Interquartile range (IQR): 2.01

Descriptive statistics

Standard deviation: 325.0319578
Coefficient of variation (CV): 92.15271025
Kurtosis: 649867.1222
Mean: 3.527101448
Median Absolute Deviation (MAD): 0.84
Skewness: 738.5508544
Sum: 86939848.09
Variance: 105645.7736
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count     Frequency (%)
0.9                  433901    1.8%
0.8                  432751    1.8%
1                    427480    1.7%
1.1                  410856    1.7%
0.7                  409927    1.7%
1.2                  391452    1.6%
1.3                  369497    1.5%
0.6                  362629    1.5%
1.4                  346212    1.4%
0                    330110    1.3%
Other values (7365)  20734277  84.1%

Minimum 10 values

Value   Count  Frequency (%)
-30.62  2      < 0.1%
-29.47  1      < 0.1%
-29.23  1      < 0.1%
-29.1   2      < 0.1%
-29.09  1      < 0.1%
-29.07  2      < 0.1%
-29.06  3      < 0.1%
-27.97  1      < 0.1%
-27.32  2      < 0.1%
-27.27  2      < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
350914.89  1      < 0.1%
350814.14  1      < 0.1%
350793.6   1      < 0.1%
350722.34  1      < 0.1%
350696.98  1      < 0.1%
350104.58  1      < 0.1%
349987.05  1      < 0.1%
349692.3   1      < 0.1%
297004.51  1      < 0.1%
275196.59  1      < 0.1%

RatecodeID
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 7
Distinct (%): < 0.1%
Missing: 809967
Missing (%): 3.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.048557361
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 188.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 1
Maximum: 99
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.761083543
Coefficient of variation (CV): 0.7258387297
Kurtosis: 13408.51976
Mean: 1.048557361
Median Absolute Deviation (MAD): 0
Skewness: 104.9910041
Sum: 24996690
Variance: 0.5792481594
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value      Count     Frequency (%)
1          23231656  94.2%
2          428170    1.7%
5          119963    0.5%
3          39440     0.2%
4          18581     0.1%
99         1165      < 0.1%
6          150       < 0.1%
(Missing)  809967    3.3%

Minimum 10 values

Value  Count     Frequency (%)
1      23231656  94.2%
2      428170    1.7%
3      39440     0.2%
4      18581     0.1%
5      119963    0.5%
6      150       < 0.1%
99     1165      < 0.1%

Maximum 10 values

Value  Count     Frequency (%)
99     1165      < 0.1%
6      150       < 0.1%
5      119963    0.5%
4      18581     0.1%
3      39440     0.2%
2      428170    1.7%
1      23231656  94.2%

store_and_fwd_flag
Boolean

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 809967
Missing (%): 3.3%
Memory size: 47.0 MiB

Value      Count     Frequency (%)
False      23593792  95.7%
True       245333    1.0%
(Missing)  809967    3.3%

PULocationID
Real number (ℝ≥0)

Distinct: 262
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 163.9707148
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 188.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 48
Q1: 114
median: 162
Q3: 234
95-th percentile: 261
Maximum: 265
Range: 264
Interquartile range (IQR): 120

Descriptive statistics

Standard deviation: 66.75225692
Coefficient of variation (CV): 0.4070986517
Kurtosis: -0.9337534246
Mean: 163.9707148
Median Absolute Deviation (MAD): 67
Skewness: -0.2776446138
Sum: 4041729235
Variance: 4455.863804
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count     Frequency (%)
237                 1145412   4.6%
236                 1089583   4.4%
161                 946854    3.8%
186                 862321    3.5%
162                 831592    3.4%
170                 754603    3.1%
142                 747187    3.0%
48                  730150    3.0%
239                 704016    2.9%
141                 688205    2.8%
Other values (252)  16149169  65.5%

Minimum 10 values

Value  Count  Frequency (%)
1      2114   < 0.1%
2      25     < 0.1%
3      1865   < 0.1%
4      35561  0.1%
5      109    < 0.1%
6      142    < 0.1%
7      34971  0.1%
8      263    < 0.1%
9      1234   < 0.1%
10     8864   < 0.1%

Maximum 10 values

Value  Count   Frequency (%)
265    57887   0.2%
264    166696  0.7%
263    573815  2.3%
262    361985  1.5%
261    110680  0.4%
260    12824   0.1%
259    2706    < 0.1%
258    2469    < 0.1%
257    1510    < 0.1%
256    10804   < 0.1%

DOLocationID
Real number (ℝ≥0)

Distinct: 263
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 161.1702657
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 188.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 42
Q1: 107
median: 162
Q3: 234
95-th percentile: 260
Maximum: 265
Range: 264
Interquartile range (IQR): 127

Descriptive statistics

Standard deviation: 70.95646814
Coefficient of variation (CV): 0.4402578094
Kurtosis: -1.014689305
Mean: 161.1702657
Median Absolute Deviation (MAD): 69
Skewness: -0.3129428061
Sum: 3972700708
Variance: 5034.82037
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count     Frequency (%)
236                 1126419   4.6%
237                 1015502   4.1%
161                 843739    3.4%
170                 738636    3.0%
141                 686809    2.8%
142                 670973    2.7%
239                 667836    2.7%
162                 666196    2.7%
48                  640791    2.6%
238                 608220    2.5%
Other values (253)  16983971  68.9%

Minimum 10 values

Value  Count   Frequency (%)
1      34575   0.1%
2      45      < 0.1%
3      3808    < 0.1%
4      109252  0.4%
5      312     < 0.1%
6      453     < 0.1%
7      90402   0.4%
8      528     < 0.1%
9      2903    < 0.1%
10     20458   0.1%

Maximum 10 values

Value  Count   Frequency (%)
265    56919   0.2%
264    152443  0.6%
263    541634  2.2%
262    380024  1.5%
261    93041   0.4%
260    28701   0.1%
259    5879    < 0.1%
258    7347    < 0.1%
257    10210   < 0.1%
256    50300   0.2%

payment_type
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.23833101
Minimum: 0
Maximum: 5
Zeros: 809967
Zeros (%): 3.3%
Negative: 0
Negative (%): 0.0%
Memory size: 188.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 2
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.5282318102
Coefficient of variation (CV): 0.4265675379
Kurtosis: 2.323307471
Mean: 1.23833101
Median Absolute Deviation (MAD): 0
Skewness: 0.9545461655
Sum: 30523735
Variance: 0.2790288453
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value  Count     Frequency (%)
1      17463775  70.8%
2      6148485   24.9%
0      809967    3.3%
3      144485    0.6%
4      82365     0.3%
5      15        < 0.1%

Minimum 10 values

Value  Count     Frequency (%)
0      809967    3.3%
1      17463775  70.8%
2      6148485   24.9%
3      144485    0.6%
4      82365     0.3%
5      15        < 0.1%

Maximum 10 values

Value  Count     Frequency (%)
5      15        < 0.1%
4      82365     0.3%
3      144485    0.6%
2      6148485   24.9%
1      17463775  70.8%
0      809967    3.3%
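For a coded column like payment_type, Sum is the value-weighted total of its frequency table. Mean is that total divided by the row count, since payment_type has no missing values. Re-deriving both from the counts listed above:

```python
# payment_type value counts from the tables above (value -> count).
payment_counts = {0: 809_967, 1: 17_463_775, 2: 6_148_485, 3: 144_485, 4: 82_365, 5: 15}

n_rows = sum(payment_counts.values())
value_sum = sum(value * count for value, count in payment_counts.items())

print(n_rows)                # 24649092
print(value_sum)             # 30523735, the reported Sum
print(value_sum / n_rows)    # about 1.23833101, the reported Mean
```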

fare_amount
Real number (ℝ)

HIGH CORRELATION
SKEWED

Distinct: 10302
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.66777977
Minimum: -1259
Maximum: 998310.03
Zeros: 10994
Zeros (%): < 0.1%
Negative: 92833
Negative (%): 0.4%
Memory size: 188.1 MiB

Quantile statistics

Minimum: -1259
5-th percentile: 4
Q1: 6.5
median: 9
Q3: 14
95-th percentile: 35.5
Maximum: 998310.03
Range: 999569.03
Interquartile range (IQR): 7.5

Descriptive statistics

Standard deviation: 274.0881567
Coefficient of variation (CV): 21.63663733
Kurtosis: 9036085.551
Mean: 12.66777977
Median Absolute Deviation (MAD): 3.5
Skewness: 2856.264782
Sum: 312249269
Variance: 75124.31762
Monotonicity: Not monotonic
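Skewness values in the thousands come from a few extreme values dominating the third moment. The sketch below computes the adjusted Fisher-Pearson coefficient G1, the bias-corrected estimator that pandas' Series.skew describes as normalized by N-1. That pandas-profiling reports exactly this estimator is an assumption, and the function name and "typical" fares are illustrative.

With n-1 identical values plus one outlier, G1 equals sqrt(n) exactly. This explains why the skewness of extra (4963.65) and mta_tax (4964.78) sits close to sqrt(24649092) ≈ 4964.8: each of those columns has one value near 500000 far above everything else.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3 and non-constant values)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5                       # biased sample skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

# A few typical fares, with and without one extreme value from the table below.
typical = [5, 5.5, 6, 6.5, 7, 7.5, 8, 9, 14, 35.5]
print(round(sample_skewness(typical), 2))
print(round(sample_skewness(typical + [998310.03]), 2))

# n - 1 identical values plus one outlier gives exactly sqrt(n).
print(round(sample_skewness([0.5] * 99 + [500000.5]), 6))  # 10.0
```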

Histogram with fixed size bins (bins=50)

Common values

Value                 Count     Frequency (%)
6                     1297558   5.3%
6.5                   1270995   5.2%
5.5                   1261888   5.1%
7                     1242564   5.0%
5                     1172501   4.8%
7.5                   1161815   4.7%
8                     1105710   4.5%
8.5                   1021014   4.1%
4.5                   956452    3.9%
9                     949970    3.9%
Other values (10292)  13208625  53.6%

Minimum 10 values

Value  Count  Frequency (%)
-1259  1      < 0.1%
-1238  1      < 0.1%
-750   1      < 0.1%
-730   1      < 0.1%
-500   2      < 0.1%
-497   1      < 0.1%
-490   1      < 0.1%
-480   1      < 0.1%
-450   1      < 0.1%
-445   1      < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
998310.03  1      < 0.1%
671100.14  1      < 0.1%
429496.72  1      < 0.1%
398464.88  1      < 0.1%
187438.96  1      < 0.1%
151504.45  1      < 0.1%
6964       1      < 0.1%
6052       1      < 0.1%
4265       1      < 0.1%
3014.5     1      < 0.1%

extra
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 447
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.071846049
Minimum: -27
Maximum: 500000.8
Zeros: 10273258
Zeros (%): 41.7%
Negative: 41620
Negative (%): 0.2%
Memory size: 188.1 MiB

Quantile statistics

Minimum: -27
5-th percentile: 0
Q1: 0
median: 0.5
Q3: 2.5
95-th percentile: 3.5
Maximum: 500000.8
Range: 500027.8
Interquartile range (IQR): 2.5

Descriptive statistics

Standard deviation: 100.7169073
Coefficient of variation (CV): 93.96583341
Kurtosis: 24641587.97
Mean: 1.071846049
Median Absolute Deviation (MAD): 0.5
Skewness: 4963.651562
Sum: 26420031.88
Variance: 10143.89542
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count     Frequency (%)
0                   10273258  41.7%
2.5                 4187100   17.0%
0.5                 3921268   15.9%
1                   3007285   12.2%
3                   1606342   6.5%
3.5                 1414608   5.7%
2.75                106280    0.4%
4.5                 59976     0.2%
-0.5                25945     0.1%
7                   20393     0.1%
Other values (437)  26637     0.1%
ValueCountFrequency (%)
-271
 
< 0.1%
-26.51
 
< 0.1%
-17.691
 
< 0.1%
-72
 
< 0.1%
-4.5846
< 0.1%
-3.56
 
< 0.1%
-37
 
< 0.1%
-2.517
 
< 0.1%
-225
 
< 0.1%
-1.31
 
< 0.1%

Maximum 10 values

Value     Count  Frequency (%)
500000.8  1      < 0.1%
113.01    1      < 0.1%
90.06     3      < 0.1%
87.56     7      < 0.1%
65.53     1      < 0.1%
55.79     1      < 0.1%
52.5      1      < 0.1%
47.4      1      < 0.1%
42.5      1      < 0.1%
36.09     1      < 0.1%

mta_tax
Real number (ℝ)

HIGH CORRELATION
SKEWED

Distinct: 24
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.51279657
Minimum: -0.5
Maximum: 500000.5
Zeros: 188223
Zeros (%): 0.8%
Negative: 90730
Negative (%): 0.4%
Memory size: 188.1 MiB

Quantile statistics

Minimum: -0.5
5-th percentile: 0.5
Q1: 0.5
median: 0.5
Q3: 0.5
95-th percentile: 0.5
Maximum: 500000.5
Range: 500001
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 100.7093215
Coefficient of variation (CV): 196.3923462
Kurtosis: 24649064.29
Mean: 0.51279657
Median Absolute Deviation (MAD): 0
Skewness: 4964.781006
Sum: 12639969.83
Variance: 10142.36743
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=24)

Common values

Value              Count     Frequency (%)
0.5                24370052  98.9%
0                  188223    0.8%
-0.5               90730     0.4%
3.3                42        < 0.1%
0.35               10        < 0.1%
0.32               8         < 0.1%
1.1                6         < 0.1%
3                  3         < 0.1%
30.8               2         < 0.1%
2.5                2         < 0.1%
Other values (14)  14        < 0.1%

Minimum 10 values

Value  Count     Frequency (%)
-0.5   90730     0.4%
0      188223    0.8%
0.32   8         < 0.1%
0.35   10        < 0.1%
0.5    24370052  98.9%
0.59   1         < 0.1%
0.83   1         < 0.1%
0.9    1         < 0.1%
1.1    6         < 0.1%
1.15   1         < 0.1%

Maximum 10 values

Value     Count  Frequency (%)
500000.5  1      < 0.1%
39.51     1      < 0.1%
30.8      2      < 0.1%
18.49     1      < 0.1%
6.8       1      < 0.1%
3.3       42     < 0.1%
3.25      1      < 0.1%
3         3      < 0.1%
2.8       1      < 0.1%
2.74      1      < 0.1%

tip_amount
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 5196
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.082026641
Minimum: -493.22
Maximum: 1393.56
Zeros: 7368621
Zeros (%): 29.9%
Negative: 1013
Negative (%): < 0.1%
Memory size: 188.1 MiB

Quantile statistics

Minimum: -493.22
5-th percentile: 0
Q1: 0
median: 1.92
Q3: 2.76
95-th percentile: 5.86
Maximum: 1393.56
Range: 1886.78
Interquartile range (IQR): 2.76

Descriptive statistics

Standard deviation: 2.610752953
Coefficient of variation (CV): 1.253947909
Kurtosis: 7264.577101
Mean: 2.082026641
Median Absolute Deviation (MAD): 1.16
Skewness: 26.07406101
Sum: 51320066.22
Variance: 6.816030981
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count     Frequency (%)
0                    7368621   29.9%
1                    1373765   5.6%
2                    934709    3.8%
2.75                 481415    2.0%
2.06                 342562    1.4%
1.96                 338258    1.4%
2.16                 329021    1.3%
1.5                  328898    1.3%
1.86                 327093    1.3%
2.26                 315058    1.3%
Other values (5186)  12509692  50.8%

Minimum 10 values

Value    Count  Frequency (%)
-493.22  1      < 0.1%
-111     1      < 0.1%
-103.06  1      < 0.1%
-93      1      < 0.1%
-91      1      < 0.1%
-87.77   1      < 0.1%
-87      1      < 0.1%
-70      1      < 0.1%
-68.02   1      < 0.1%
-63      1      < 0.1%

Maximum 10 values

Value    Count  Frequency (%)
1393.56  1      < 0.1%
1100     1      < 0.1%
1001     1      < 0.1%
800      1      < 0.1%
593      1      < 0.1%
550      1      < 0.1%
549.02   1      < 0.1%
500      2      < 0.1%
493.22   1      < 0.1%
480      1      < 0.1%

tolls_amount
Real number (ℝ)

SKEWED
ZEROS

Distinct: 1915
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3038689531
Minimum: -40
Maximum: 925.5
Zeros: 23523820
Zeros (%): 95.4%
Negative: 1744
Negative (%): < 0.1%
Memory size: 188.1 MiB

Quantile statistics

Minimum: -40
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 925.5
Range: 965.5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.604901844
Coefficient of variation (CV): 5.281559132
Kurtosis: 24424.79912
Mean: 0.3038689531
Median Absolute Deviation (MAD): 0
Skewness: 55.09558913
Sum: 7490093.78
Variance: 2.575709928
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count     Frequency (%)
0                    23523820  95.4%
6.12                 1010858   4.1%
2.8                  18663     0.1%
11.75                15333     0.1%
12.24                14554     0.1%
13.75                11823     < 0.1%
2.29                 5450      < 0.1%
18.36                3620      < 0.1%
8.41                 2381      < 0.1%
18.75                1574      < 0.1%
Other values (1905)  41016     0.2%

Minimum 10 values

Value   Count  Frequency (%)
-40     1      < 0.1%
-38.23  1      < 0.1%
-35.74  1      < 0.1%
-32.74  1      < 0.1%
-30     1      < 0.1%
-29.62  1      < 0.1%
-28.75  1      < 0.1%
-27.5   2      < 0.1%
-27     1      < 0.1%
-25.99  2      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
925.5   1      < 0.1%
911.75  1      < 0.1%
910.5   1      < 0.1%
853.55  1      < 0.1%
831.75  1      < 0.1%
700.87  1      < 0.1%
612     1      < 0.1%
601.02  1      < 0.1%
600.04  1      < 0.1%
600     1      < 0.1%

improvement_surcharge
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 188.1 MiB

Length

Max length: 4
Median length: 3
Mean length: 3.003751173
Min length: 3
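The Length and character totals follow from the three category strings: "-0.3" has four characters and the other two have three. A quick recomputation from the counts in Common Values below:

```python
# Category counts for improvement_surcharge, as strings (value -> count).
surcharge_counts = {"0.3": 24_540_712, "-0.3": 92_463, "0.0": 15_917}

n_rows = sum(surcharge_counts.values())
total_chars = sum(len(s) * n for s, n in surcharge_counts.items())

print(total_chars)            # 74039739, the reported Total characters
print(total_chars / n_rows)   # about 3.003751173, the reported Mean length
```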

Characters and Unicode

Total characters: 74039739
Distinct characters: 4
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.3
2nd row: 0.3
3rd row: 0.3
4th row: 0.3
5th row: 0.3

Common Values

Value  Count     Frequency (%)
0.3    24540712  99.6%
-0.3   92463     0.4%
0.0    15917     0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count     Frequency (%)
0.3    24633175  99.9%
0.0    15917     0.1%

Most occurring characters

Value  Count     Frequency (%)
0      24665009  33.3%
.      24649092  33.3%
3      24633175  33.3%
-      92463     0.1%

Most occurring categories

Value              Count     Frequency (%)
Decimal Number     49298184  66.6%
Other Punctuation  24649092  33.3%
Dash Punctuation   92463     0.1%

Most frequent character per category

Decimal Number
Value  Count     Frequency (%)
0      24665009  50.0%
3      24633175  50.0%
Other Punctuation
Value  Count     Frequency (%)
.      24649092  100.0%
Dash Punctuation
Value  Count     Frequency (%)
-      92463     100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  74039739  100.0%

Most frequent character per script

Common
Value  Count     Frequency (%)
0      24665009  33.3%
.      24649092  33.3%
3      24633175  33.3%
-      92463     0.1%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  74039739  100.0%

Most frequent character per block

ASCII
Value  Count     Frequency (%)
0      24665009  33.3%
.      24649092  33.3%
3      24633175  33.3%
-      92463     0.1%

total_amount
Real number (ℝ)

HIGH CORRELATION
SKEWED

Distinct: 18162
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.4217469
Minimum: -1260.3
Maximum: 1000003.8
Zeros: 7457
Zeros (%): < 0.1%
Negative: 92683
Negative (%): 0.4%
Memory size: 188.1 MiB

Quantile statistics

Minimum: -1260.3
5-th percentile: 8.15
Q1: 11.16
median: 14.3
Q3: 19.8
95-th percentile: 45.88
Maximum: 1000003.8
Range: 1001264.1
Interquartile range (IQR): 8.64

Descriptive statistics

Standard deviation: 340.2244935
Coefficient of variation (CV): 18.46863358
Kurtosis: 6833943.296
Mean: 18.4217469
Median Absolute Deviation (MAD): 3.98
Skewness: 2523.557184
Sum: 454079334.2
Variance: 115752.706
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                 Count     Frequency (%)
9.8                   478113    1.9%
10.3                  475155    1.9%
9.3                   472581    1.9%
10.8                  460624    1.9%
8.8                   447660    1.8%
11.3                  439171    1.8%
11.8                  417338    1.7%
12.3                  392620    1.6%
8.3                   385668    1.6%
12.8                  369603    1.5%
Other values (18152)  20310559  82.4%

Minimum 10 values

Value    Count  Frequency (%)
-1260.3  1      < 0.1%
-1242.3  1      < 0.1%
-750.3   1      < 0.1%
-730.3   1      < 0.1%
-502.8   1      < 0.1%
-502.02  1      < 0.1%
-500.3   1      < 0.1%
-497.3   1      < 0.1%
-490.3   1      < 0.1%
-480.8   1      < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
1000003.8  1      < 0.1%
998325.61  1      < 0.1%
671103.17  1      < 0.1%
429562.25  1      < 0.1%
398467.7   1      < 0.1%
187443.26  1      < 0.1%
151522.07  1      < 0.1%
8361.36    1      < 0.1%
6061.42    1      < 0.1%
4268.3     1      < 0.1%

congestion_surcharge
Real number (ℝ)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 12
Distinct (%): < 0.1%
Missing: 809967
Missing (%): 3.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.271167075
Minimum: -2.5
Maximum: 3
Zeros: 2033937
Zeros (%): 8.3%
Negative: 74015
Negative (%): 0.3%
Memory size: 188.1 MiB

Quantile statistics

Minimum: -2.5
5-th percentile: 0
Q1: 2.5
median: 2.5
Q3: 2.5
95-th percentile: 2.5
Maximum: 3
Range: 5.5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.7473420881
Coefficient of variation (CV): 0.3290564117
Kurtosis: 9.442761087
Mean: 2.271167075
Median Absolute Deviation (MAD): 0
Skewness: -3.176351933
Sum: 54142635.8
Variance: 0.5585201967
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)

Common values

Value             Count     Frequency (%)
2.5               21730865  88.2%
0                 2033937   8.3%
-2.5              74011     0.3%
0.75              160       < 0.1%
2.75              134       < 0.1%
0.5               5         < 0.1%
1                 4         < 0.1%
-0.75             4         < 0.1%
1.5               2         < 0.1%
3                 1         < 0.1%
Other values (2)  2         < 0.1%
(Missing)         809967    3.3%

Minimum 10 values

Value  Count     Frequency (%)
-2.5   74011     0.3%
-0.75  4         < 0.1%
0      2033937   8.3%
0.5    5         < 0.1%
0.75   160       < 0.1%
0.8    1         < 0.1%
1      4         < 0.1%
1.5    2         < 0.1%
2      1         < 0.1%
2.5    21730865  88.2%

Maximum 10 values

Value  Count     Frequency (%)
3      1         < 0.1%
2.75   134       < 0.1%
2.5    21730865  88.2%
2      1         < 0.1%
1.5    2         < 0.1%
1      4         < 0.1%
0.8    1         < 0.1%
0.75   160       < 0.1%
0.5    5         < 0.1%
0      2033937   8.3%

airport_fee
Categorical

CONSTANT
HIGH CORRELATION
MISSING
REJECTED

Distinct: 1
Distinct (%): 4.8%
Missing: 24649071
Missing (%): > 99.9%
Memory size: 188.1 MiB
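Distinct (%) appears to be computed over non-missing values only. airport_fee has a single distinct value among its 21 non-missing rows, and 1/21 matches the reported 4.8%. The arithmetic:

```python
n_rows = 24_649_092
n_missing = 24_649_071   # airport_fee missing count
n_distinct = 1           # the constant value "0.0"

n_present = n_rows - n_missing
print(n_present)                                              # 21
print(f"Distinct (%): {100 * n_distinct / n_present:.1f}%")   # Distinct (%): 4.8%
```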

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 63
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value      Count     Frequency (%)
0.0        21        < 0.1%
(Missing)  24649071  > 99.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0.0    21     100.0%

Most occurring characters

Value  Count  Frequency (%)
0      42     66.7%
.      21     33.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     42     66.7%
Other Punctuation  21     33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      42     100.0%
Other Punctuation
Value  Count  Frequency (%)
.      21     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  63     100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      42     66.7%
.      21     33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  63     100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      42     66.7%
.      21     33.3%

Interactions

2022-09-30T01:11:46.339370image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:13:13.392919image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:14:39.220649image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:16:05.315702image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:17:35.176007image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:19:06.827213image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:21:01.927059image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:22:33.301413image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:24:13.029007image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:25:54.536197image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:27:27.630273image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:29:02.023153image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:30:43.442495image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:10:23.533587image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:11:52.698492image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:13:19.704202image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:14:45.245923image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:16:11.617408image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:17:41.485019image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:19:15.099684image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:21:08.806839image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:22:40.587747image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:24:20.436177image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:26:01.555216image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:27:34.187599image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:29:09.320029image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:30:50.191075image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:10:30.057088image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:11:58.842518image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:13:25.960528image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:14:51.187784image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:16:18.095903image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:17:48.093615image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:19:23.678593image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:21:14.907638image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:22:47.517724image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:24:27.681122image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:26:08.508612image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:27:40.585029image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:29:16.198526image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:30:57.171883image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:10:36.522477image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:12:04.981075image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:13:32.241389image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:14:57.148527image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:16:24.611098image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:17:54.746173image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:19:32.540773image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:21:21.357443image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:22:54.232365image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:24:34.914739image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:26:15.930241image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:27:47.432899image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:29:22.889285image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:31:04.045221image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:10:42.898790image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:12:11.144417image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:13:38.495247image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:15:03.105111image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:16:31.136281image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:18:01.318888image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:19:41.126184image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:21:27.882293image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:23:01.160250image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:24:41.956919image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:26:23.475522image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:27:54.457472image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:29:29.567826image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:31:11.239440image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:10:49.855662image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:12:17.254537image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:13:44.761358image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:15:09.107903image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:16:37.695153image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:18:07.812812image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:19:49.653456image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:21:34.350948image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:23:07.817609image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:24:49.181496image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:26:30.049046image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:28:02.034661image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:29:36.346397image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:31:17.790215image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:10:56.211791image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:12:23.534775image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:13:50.980383image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:15:15.090744image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:16:44.301203image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:18:14.361098image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:19:58.332748image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:21:40.904931image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:23:14.757184image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:24:56.556792image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:26:36.563687image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:28:09.152858image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:29:43.226230image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:31:24.032983image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:11:03.020327image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:12:29.702827image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:13:57.274725image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:15:21.067441image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:16:50.847551image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:18:20.868179image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:20:06.927489image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:21:47.434274image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:23:22.461720image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:25:03.842592image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:26:43.043908image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:28:15.777379image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:29:49.791724image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:31:29.810742image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:11:09.297892image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:12:36.031083image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:14:03.306992image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:15:27.041028image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:16:57.204488image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:18:27.217449image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:20:14.729394image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:21:53.711875image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:23:29.658828image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:25:10.792721image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:26:49.381469image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:28:21.992696image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-30T01:29:56.499944image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
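The recipe above can be written out as a short plain-Python sketch. This is not the profiling library's own code. Each variable is ranked, with tied values sharing the average of their positions, and Pearson's formula is then applied to the ranks. The helper names `average_ranks`, `pearson` and `spearman` are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the whole run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations (the 1/n factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(round(spearman(x, y), 6), round(pearson(x, y), 3))  # prints 1.0 0.943
```

The cubic relationship is perfectly monotonic, so ρ is 1. Pearson's r is only about 0.943 because the points do not lie on a straight line.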

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (a negative scale factor only flips its sign). For an exactly linear relationship, the steepness of the line therefore does not affect r: any positive slope gives r = +1 and any negative slope gives r = -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
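Here is a minimal sketch of that recipe in plain Python, using population (1/n) moments on toy data. `pearson_r` is a hypothetical helper, not the profiling library's implementation. The checks at the end exercise the location and scale invariance described above.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Population covariance and standard deviations (1/n scaling).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)  # cov = 1.6, var_x = var_y = 2, so r = 0.8

# Shifting and positively rescaling either variable leaves r unchanged.
r_affine = pearson_r([10 * a - 3 for a in x], [0.5 * b + 7 for b in y])
# A negative scale factor flips the sign of r.
r_flipped = pearson_r(x, [-2 * b for b in y])
# Exactly linear data gives r = 1 regardless of how steep the positive slope is.
r_steep = pearson_r(x, [100 * a + 1 for a in x])
r_shallow = pearson_r(x, [0.01 * a for a in x])
print(r, r_affine, r_flipped, r_steep, r_shallow)
```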

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates perfect negative correlation, 0 indicates no correlation, and +1 indicates perfect positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n−1)/2.
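Below is a direct O(n²) sketch of the pair-counting definition. It computes the variant that divides by all n(n−1)/2 pairs (often called τ-a). Pairs tied in either variable count as neither concordant nor discordant. Library implementations may instead report the tie-adjusted τ-b and use faster O(n log n) algorithms, so treat this as illustrative only. `kendall_tau_a` is a hypothetical name.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / (n(n-1)/2), counting every pair once."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
        # s == 0: tie in x or y, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)

# One swapped pair out of six: C = 5, D = 1, so tau = 4 / 6
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```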

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators commonly used for Cramér's V are known to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
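The following sketch shows the bias correction, assuming the commonly cited form of Bergsma's estimator. First, φ² = χ²/n is reduced by (k−1)(r−1)/(n−1) and clipped at 0. The table dimensions are then corrected to r̃ = r − (r−1)²/(n−1) and k̃ = k − (k−1)²/(n−1). Finally, Ṽ = sqrt(φ̃² / min(k̃−1, r̃−1)). The function name and the list-of-lists input format are hypothetical, and every row and column total is assumed to be nonzero.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramer's V for an r x k table of observed counts (list of rows)."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]
    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bias-corrected phi^2 (clipped at zero) and corrected dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent counts
print(cramers_v_corrected([[30, 10], [10, 30]]))  # moderate association
```

For the second table, the uncorrected V is exactly 0.5. The corrected value comes out slightly lower (about 0.49), which shows the downward adjustment the correction is meant to make.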

Phik (φk)

Phik (φk) is a new, practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Missing values

A simple visualization of nullity by column.
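The per-column nullity bar chart boils down to a count of non-null values per column. Here is a minimal sketch on toy data shaped like the taxi columns. The column values and the `nullity_by_column` helper are hypothetical, with `None` standing in for a missing cell.

```python
def nullity_by_column(columns):
    """Map each column name to (non-null count, fraction of cells missing)."""
    summary = {}
    for name, values in columns.items():
        missing = sum(1 for v in values if v is None)
        summary[name] = (len(values) - missing, missing / len(values))
    return summary

# Hypothetical toy columns, loosely modelled on the taxi data.
columns = {
    "trip_distance": [4.70, 0.00, 8.32, 44.49],
    "passenger_count": [1.0, 1.0, None, None],
    "airport_fee": [None, None, None, None],
}
for name, (non_null, frac_missing) in nullity_by_column(columns).items():
    print(f"{name}: {non_null} non-null, {frac_missing:.0%} missing")
```

Applied to the full dataset, the same arithmetic gives the figures in the alerts. For passenger_count, 809967 / 24649092 ≈ 0.0329, or 3.3% missing. airport_fee has only 24649092 − 24649071 = 21 non-null values.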
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
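The sketch below assumes that nullity correlation is the Pearson correlation between per-column missingness indicators (1 where a value is missing, 0 otherwise). That is how the missingno library, which these diagrams resemble, computes its heatmap. The helper name and the toy data are hypothetical. A column that is never, or always, missing has a constant indicator, so its correlation is undefined and is returned as `None` here.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation of the missingness indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None  # constant indicator: correlation undefined
    return cov / math.sqrt(var_a * var_b)

# Toy columns that go missing together, like passenger_count and RatecodeID.
passenger_count = [1.0, 2.0, None, 1.0, None]
ratecode_id = [1.0, 1.0, None, 1.0, None]
trip_distance = [4.70, 1.10, 8.32, 0.87, 44.49]  # never missing
print(nullity_correlation(passenger_count, ratecode_id))
print(nullity_correlation(passenger_count, trip_distance))
```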
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

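| | df_index | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2020-03-01 00:31:13 | 2020-03-01 01:01:42 | 1.0 | 4.70 | 1.0 | N | 88 | 255 | 1 | 22.0 | 3.0 | 0.5 | 2.00 | 0.0 | 0.3 | 27.80 | 2.5 | None |
| 1 | 1 | 2 | 2020-03-01 00:08:22 | 2020-03-01 00:08:49 | 1.0 | 0.00 | 1.0 | N | 193 | 193 | 2 | 2.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 3.80 | 0.0 | None |
| 2 | 2 | 1 | 2020-03-01 00:52:18 | 2020-03-01 00:59:16 | 1.0 | 1.10 | 1.0 | N | 246 | 90 | 1 | 6.0 | 3.0 | 0.5 | 1.95 | 0.0 | 0.3 | 11.75 | 2.5 | None |
| 3 | 3 | 2 | 2020-03-01 00:47:53 | 2020-03-01 00:50:57 | 2.0 | 0.87 | 1.0 | N | 151 | 238 | 1 | 5.0 | 0.5 | 0.5 | 1.76 | 0.0 | 0.3 | 10.56 | 2.5 | None |
| 4 | 4 | 1 | 2020-03-01 00:43:19 | 2020-03-01 00:58:27 | 0.0 | 4.40 | 1.0 | N | 79 | 261 | 1 | 16.5 | 3.0 | 0.5 | 4.05 | 0.0 | 0.3 | 24.35 | 2.5 | None |
| 5 | 5 | 1 | 2020-03-01 00:04:43 | 2020-03-01 00:23:17 | 1.0 | 3.50 | 1.0 | Y | 113 | 142 | 1 | 15.0 | 3.0 | 0.5 | 3.75 | 0.0 | 0.3 | 22.55 | 2.5 | None |
| 6 | 6 | 1 | 2020-03-01 00:43:21 | 2020-03-01 01:14:36 | 1.0 | 14.10 | 1.0 | Y | 237 | 14 | 1 | 40.5 | 3.0 | 0.5 | 8.85 | 0.0 | 0.3 | 53.15 | 2.5 | None |
| 7 | 7 | 1 | 2020-03-01 00:51:35 | 2020-03-01 01:00:17 | 1.0 | 1.00 | 1.0 | N | 234 | 114 | 1 | 7.0 | 3.0 | 0.5 | 1.30 | 0.0 | 0.3 | 12.10 | 2.5 | None |
| 8 | 8 | 1 | 2020-03-01 00:13:42 | 2020-03-01 00:23:00 | 4.0 | 1.10 | 1.0 | N | 148 | 211 | 1 | 7.5 | 3.0 | 0.5 | 2.00 | 0.0 | 0.3 | 13.30 | 2.5 | None |
| 9 | 9 | 1 | 2020-03-01 00:25:05 | 2020-03-01 00:31:06 | 2.0 | 1.30 | 1.0 | N | 211 | 249 | 1 | 6.5 | 3.0 | 0.5 | 2.00 | 0.0 | 0.3 | 12.30 | 2.5 | None |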

Last rows

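| | df_index | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24649082 | 549787 | 2 | 2020-06-30 23:30:59 | 2020-07-01 00:00:40 | NaN | 8.32 | NaN | None | 165 | 232 | 0 | 45.65 | 0.0 | 0.5 | 2.75 | 0.00 | 0.3 | 49.20 | NaN | None |
| 24649083 | 549788 | 2 | 2020-06-30 23:16:23 | 2020-07-01 00:10:03 | NaN | 44.49 | NaN | None | 44 | 259 | 0 | 124.58 | 0.0 | 0.5 | 2.75 | 6.12 | 0.3 | 134.25 | NaN | None |
| 24649084 | 549789 | 2 | 2020-06-30 23:20:22 | 2020-06-30 23:31:10 | NaN | 3.30 | NaN | None | 107 | 256 | 0 | 23.56 | 0.0 | 0.5 | 2.75 | 0.00 | 0.3 | 27.11 | NaN | None |
| 24649085 | 549790 | 2 | 2020-06-30 23:54:00 | 2020-06-30 23:59:00 | NaN | 1.85 | NaN | None | 50 | 68 | 0 | 8.14 | 0.0 | 0.5 | 2.47 | 0.00 | 0.3 | 13.91 | NaN | None |
| 24649086 | 549791 | 2 | 2020-06-30 23:42:00 | 2020-06-30 23:58:00 | NaN | 3.10 | NaN | None | 36 | 72 | 0 | 13.81 | 0.0 | 0.5 | 4.94 | 0.00 | 0.3 | 19.55 | NaN | None |
| 24649087 | 549792 | 2 | 2020-06-30 23:05:00 | 2020-06-30 23:32:00 | NaN | 12.96 | NaN | None | 17 | 69 | 0 | 32.91 | 0.0 | 0.5 | 2.75 | 6.12 | 0.3 | 42.58 | NaN | None |
| 24649088 | 549793 | 2 | 2020-06-30 23:21:47 | 2020-06-30 23:25:24 | NaN | 0.36 | NaN | None | 41 | 41 | 0 | 11.45 | 0.0 | 0.5 | 2.75 | 0.00 | 0.3 | 15.00 | NaN | None |
| 24649089 | 549794 | 2 | 2020-06-30 23:34:00 | 2020-06-30 23:44:00 | NaN | 2.36 | NaN | None | 242 | 81 | 0 | 18.45 | 0.0 | 0.5 | 2.75 | 0.00 | 0.3 | 22.00 | NaN | None |
| 24649090 | 549795 | 2 | 2020-06-30 23:22:47 | 2020-06-30 23:42:01 | NaN | 5.50 | NaN | None | 14 | 118 | 0 | 15.90 | 0.0 | 0.5 | 6.23 | 12.24 | 0.3 | 35.17 | NaN | None |
| 24649091 | 549796 | 2 | 2020-06-30 23:56:18 | 2020-07-01 00:27:19 | NaN | 9.59 | NaN | None | 61 | 137 | 0 | 29.68 | 0.0 | 0.5 | 0.00 | 0.00 | 0.3 | 32.98 | NaN | None |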